FUZZY ROBUST STATISTICS FOR APPLICATION TO
THE FUZZY C-MEANS CLUSTERING ALGORITHM


1.  INTRODUCTION 

Clustering is an important tool for discovering patterns in exploratory data analysis. In
pattern recognition, clustering is one technique used before designing a classifier. Clustering is
also a form of unsupervised learning helpful in defining the rules in fuzzy system design. Fuzzy
sets are used in clustering algorithms since each data point may belong to more than one cluster
at the same time, permitting smoother convergence of the clustering process. The result of a
fuzzy clustering algorithm is a fuzzy partition of the data into c classes. Fuzzy c-Means, a
powerful clustering algorithm, illustrates the success of the fuzzy algorithms that have emerged
over the last three decades.

Fuzzy  c-Means  is  a  generalization  of  the  hard  c-Means  algorithm.  Hard  c-Means  is  an 
iterative  procedure  that  assigns  a  class  to  each  point  based  on  the  closest  class  exemplar.  The 
class  assignment  forms  a  partition  of  the  data  set  and  thus  generates  equivalence  classes.  Fuzzy 
c-Means  generalizes  hard  c-Means  by  softening  class  membership  of  the  data  points  in  the  c 
classes. Instead of a data point belonging to a unique class, a data point may belong to all of the
classes, but with varying degrees of membership. So, associated with each data point is a fuzzy
unit vector (fit vector) where the i-th element in the vector is the membership value of the point
in the i-th class, and the sum of the elements of the vector must be one. In fuzzy c-Means, the
distances to the class exemplars are used to modify the fit vector to change the relative
memberships of the data point in the classes. The clustering yields a fuzzy partition of the data.

In  both  of  the  c-Means  clustering  algorithms,  the  class  exemplar  or  center  is  calculated 
using  linear  statistics.  For  hard  c-Means,  the  exemplar  is  a  strict  arithmetic  average;  for  fuzzy  c- 
Means,  it  is  a  weighted  average  in  which  the  weights  are  the  memberships  in  the  classes.  But 
linear  statistics  are  notoriously  vulnerable  to  outliers;  for  example,  only  one  bad  data  point  or 
outlier  can  destroy  the  sample  mean  as  a  measure  of  centrality  and  the  sample  variance  as  a 
measure  of  dispersion.  One  goal  of  robust  statistics  is  to  develop  statistics  that  are  more  resistant 
to outliers. Two examples of these statistics are the median for centering the data and the median
absolute deviation from the median (MAD) for estimating the dispersion of the data. However,
these statistics have not been designed to work with fuzzy sets. Both statistics are based on
order statistics that do not inherently take into account membership of the data points in more
than one set or class.

This  report  addresses  the  above  deficiency  by  introducing  a  fuzzy  median  and  a  fuzzy 
MAD  estimator.  When  these  statistics  are  used  with  the  fuzzy  c-Means  algorithm,  the  result  is  a 
more  robust  algorithm. 

In section 2, the fuzzy c-Means algorithm is described in detail. Examples show how this
algorithm works well on light-tailed clusters but not on heavy-tailed clusters, the problem lying
with linear statistics used within this algorithm. Section 3 shows how the median and MAD
estimators are "fuzzified" so that they can be applied to the fuzzy c-Means algorithm. Section 4
outlines the modified fuzzy c-Means procedure and gives an example of how it works; the
modified algorithm is shown to cluster data sets generated by heavy-tailed distributions like the
Cauchy and the Slash distributions. Section 5 summarizes the results and indicates the direction
of future work on fuzzy robust statistics.

2.  FUZZY C-MEANS CLUSTERING ALGORITHM


Fuzzy  c-Means  is  a  practical  clustering  algorithm  that  generalizes  the  hard  c-Means 
algorithm.  The  latter  is  an  iterative  procedure  that  is  best  described  operationally.  Given  an 
initial  assignment  of  data  points  to  classes,  one  calculates  class  exemplars  by  averaging  the  data 
points in the various classes. These class exemplars are then used to assign new classes by
calculating the closest exemplar for each data point. Data points assume the class identity of
their  closest  exemplar.  This  procedure  is  repeated  until  some  form  of  convergence  occurs.  The 
algorithm  yields  a  partition  of  the  data  points  into  classes. 
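
The hard c-Means procedure described above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not code from the original report: the function name `hard_c_means`, the Euclidean distance, the "no label changed" stopping rule, and the demonstration data are all assumptions, and the sketch assumes that no class ever becomes empty.

```python
import numpy as np

def hard_c_means(X, labels, n_iter=100):
    """Alternate exemplar averaging and nearest-exemplar reassignment."""
    X = np.asarray(X, dtype=float)            # shape (N, p)
    labels = np.asarray(labels).copy()        # initial class of each point
    c = int(labels.max()) + 1
    for _ in range(n_iter):
        # Exemplars: arithmetic mean of the points currently in each class
        # (assumes every class keeps at least one point).
        V = np.array([X[labels == i].mean(axis=0) for i in range(c)])
        # Each point takes the class identity of its closest exemplar.
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # convergence: nothing changed
            break
        labels = new_labels
    return V, labels

# Hypothetical data: two well-separated groups with a mixed initial assignment.
X_demo = np.array([[0., 0.], [0., 1.], [1., 0.],
                   [10., 10.], [10., 11.], [11., 10.]])
V_demo, labels_demo = hard_c_means(X_demo, [0, 1, 0, 1, 0, 1])
print(labels_demo)
print(V_demo)
```

Because each exemplar is a plain arithmetic mean, a single extreme point assigned to a class shifts that exemplar by an amount proportional to its distance.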

Fuzzy  c-Means  generalizes  the  hard  c-Means  algorithm  by  replacing  the  class  assignment 
with a membership vector whose elements represent the membership of the data points in each of
the classes. The algorithm produces a fuzzy partition of the data into c classes, i.e., each point
has a membership vector or fit vector associated with it, rather than a single class assignment. The
algorithm is basically an unsupervised learning technique, and the following description is based
on Bezdek's approach to fuzzy pattern recognition (references 1-2).


Consider N data samples forming the data set denoted by $X = \{x_1, x_2, \ldots, x_N\}$, where each
sample is a p-dimensional real vector, $x_k \in \mathbb{R}^p$. Assume there are c classes and $u_{ik} = u_i(x_k) \in [0,1]$
is the membership of the k-th sample in the i-th class. This leads to a matrix
representation of the membership function associated with the fuzzy c-Partition $M_{fc}$ defined as

$$M_{fc} = \left\{ U \in V_{cN} \;\middle|\; u_{ik} \in [0,1] \ \forall i,k; \quad \sum_{i=1}^{c} u_{ik} = 1 \ \forall k; \quad 0 < \sum_{k=1}^{N} u_{ik} < N \ \forall i \right\},$$


where $V_{cN}$ is the set of real c x N matrices and c is an integer such that $2 \le c < N$ (reference 1,
p. 26). Each row of U is a class and each column of U is a fit vector in which the i-th vector
element represents the membership of the data vector in the i-th class. The sum of the elements
in any column must be one. Thus, each column describes the degree to which each point belongs to
each of the classes. $v = \{v_1, v_2, \ldots, v_c\}$ is the set of cluster exemplars or prototypes for the c
clusters. The fuzzy c-Means algorithm minimizes the functional

$$J(U, v) = \sum_{k=1}^{N} \sum_{i=1}^{c} (u_{ik})^{m_c} (d_{ik})^2, \qquad d_{ik} = \| x_k - v_i \|,$$

using  the  following  algorithm  (found  in  reference  1): 

1.  Fix c, the number of classes such that $c \in \{2, \ldots, N-1\}$.

Choose an inner product norm metric in $\mathbb{R}^p$ and fix the weighting exponent $m_c \in [1, \infty)$.
(The c in $m_c$ refers to the c-Means algorithm and m refers to the median.)

Initialize the membership matrix denoted by $U^{(0)} \in M_{fc}$.

2.  Construct the c exemplars by a weighted average $v_i = \sum_{k=1}^{N} w_{ik} x_k$, where the weights
$w_{ik}$ are the normalized membership functions given by

$$w_{ik} = \frac{(u_{ik})^{m_c}}{\sum_{j=1}^{N} (u_{ij})^{m_c}}.$$

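3.  Update the memberships $u_{ik}$ in the membership matrix with

$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{2/(m_c - 1)} \right]^{-1},$$

where $d_{ik}$ is the distance from $x_k$ to $v_i$, provided of course that none of the $d_{ik}$ are zero.
In the latter case, the $u_{ik}$ are assigned differently (reference 1, p. 66).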

4.  Compare the last two membership matrices, $U^{(l)}$ and $U^{(l+1)}$, and when they are
sufficiently close, terminate the algorithm; otherwise, return to step 2.

Note that in step 2 the exemplars are constructed using a linear combination of the data points
where the weights are normalized membership functions.
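
Steps 1 through 4 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used for the figures: it uses the Euclidean norm, the name `fuzzy_c_means` and the parameters `tol` and `max_iter` are hypothetical, the membership update requires $m_c > 1$, and zero distances are handled with a crude numerical floor instead of the separate assignment rule cited in reference 1.

```python
import numpy as np

def fuzzy_c_means(X, U0, m_c=2.0, tol=1e-5, max_iter=100):
    """Fuzzy c-Means with Euclidean distances; U has shape (c, N)."""
    X = np.asarray(X, dtype=float)            # shape (N, p)
    U = np.asarray(U0, dtype=float).copy()    # columns are fit vectors
    for _ in range(max_iter):
        # Step 2: exemplars as weighted averages with normalized weights w_ik.
        W = U ** m_c
        W /= W.sum(axis=1, keepdims=True)
        V = W @ X                             # shape (c, p)
        # Step 3: membership update from the distances d_ik = ||x_k - v_i||.
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        D = np.fmax(D, 1e-12)                 # assumption: floor instead of the d_ik = 0 rule
        inv = D ** (-2.0 / (m_c - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        # Step 4: stop when successive membership matrices are close.
        done = np.abs(U_new - U).max() < tol
        U = U_new
        if done:
            break
    return U, V

# Hypothetical data and initial memberships.
X_demo = np.array([[0., 0.], [0., 1.], [1., 0.],
                   [10., 10.], [10., 11.], [11., 10.]])
u0 = np.array([0.6, 0.4, 0.6, 0.4, 0.6, 0.4])
U_demo, V_demo = fuzzy_c_means(X_demo, np.vstack([u0, 1.0 - u0]))
print(np.round(V_demo, 3))
```

The line `V = W @ X` is the linear statistic that the rest of the report replaces: every row of `V` is a weighted mean and inherits the mean's sensitivity to outliers.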

The distances can be generalized to include the covariance of each cluster; the covariance
is a fuzzy version as estimated by Kessel (reference 3) where the i-th class covariance matrix $\Sigma_i$ is
given by $\Sigma_i^{-1} = K S_i^{-1}$, where K is related to a volume constraint on the i-th cluster and $S_i$ is the
fuzzy scatter matrix given by

$$S_i = \sum_{k=1}^{N} w_{ik} (x_k - v_i)(x_k - v_i)^t.$$


The constant K is given by $[\rho_i \det(S_i)]^{1/p}$, where $\rho_i$ is related to a volume constraint on the
i-th cluster and p is the dimension of the vector space. If the dispersion of the cluster is used
within the algorithm, then between step 2 where the exemplars are updated and step 3 where the
memberships are updated, the fuzzy covariances must be estimated since the distances are now
defined by $(d_{ik})^2 = (x_k - v_i)^t \Sigma_i^{-1} (x_k - v_i)$. The fuzzy scatter matrix is also a linear
combination of the outer products of the centered data vectors, so it too is vulnerable to the
presence of outliers.

This algorithm with the covariance modification of Kessel was applied to two Gaussian
clusters located at antipodal positions as illustrated in figure 1. These clusters were generated
using centers at (b,0) and (-b,0), where b = 1.27, and an identity covariance matrix. The value of
b was chosen to give a 10-percent classification error using a linear discriminant function. The
fuzzy c-Means algorithm was run for 20 iterations. The results are illustrated in figure 2 where
the path of the exemplar solutions produces tracks that converge toward the cluster centers.

Figure 2.  Fuzzy c-Means Convergence on Two Gaussian Clusters
Located at Antipodal Positions. (The two distributions used to
generate the random deviates are N(-1.27,1) and N(1.27,1).)

The same algorithm applied to two Cauchy clusters placed at the same centers does not
work as well. In fact, the algorithm does not appear to converge after 20 iterations. Figure 3
shows the two Cauchy clusters, which appear to be more dispersed, although benign. Figure 3 is
deceiving as not all the data points are shown because they would not fit on this scale. A small
percentage of the points are extreme outliers, as is typical of clusters generated using a Cauchy
distribution function. In fact, the Cauchy distribution does not technically possess any moments.
For the p-th moment of any distribution to be defined, one must have $E|x|^p < \infty$, which is not the
case for any $p \ge 1$. The outliers destroy the exemplar estimation in step 2 of the algorithm, as
well as the covariance estimation, which is dependent on the results of step 2. This is precisely
the reason for developing the fuzzy median and the fuzzy MAD.

Figure 3.  Scatter Diagram for the Cauchy Clusters Centered at (-1.27, 0.00)
and (1.27, 0.00) (Not all the points are shown because of the extreme
outliers caused by this heavy-tailed distribution.)
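
The effect of the missing moments can be illustrated numerically. The sketch below is an assumption-laden illustration, not a reproduction of the report's experiment: the seed, the sample size, and the use of one coordinate only are arbitrary choices. It draws Cauchy deviates centered at b = 1.27 and compares the running sample mean with the sample median.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary fixed seed
b = 1.27                                  # cluster center used in the report
x = b + rng.standard_cauchy(10_000)       # one coordinate of a Cauchy cluster

# Running mean after each new sample, and the median of the full sample.
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
sample_median = np.median(x)

print("largest |x|:", np.abs(x).max())
print("running mean at N = 100, 1000, 10000:", running_mean[[99, 999, 9999]])
print("sample median:", sample_median)
```

The running mean of Cauchy deviates does not settle as the sample grows, whereas the median stays close to the center, which is the behavior the robust statistics of the next section exploit.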

It  is  easy  enough  to  say  that  one  should  first  remove  these  outliers  so  they  would  be  no 
problem.  In  the  best  of  all  worlds  this  is  a  possibility,  but  it  is  not  easily  done,  especially  in  high 
dimensional  data.  One  would  need  to  run  other  clustering  algorithms  or  use  nonparametric 
statistics to calculate the cluster centers and then "peel off" the outliers. A better solution is to
use a robust method that is resistant to outliers so that the extensive exploratory data analysis
could be eliminated. In automated decision systems, this is a necessity. The approach is to
replace the centering and dispersion estimates with some robust counterpart resistant to outliers.
However, these statistical counterparts have to be compatible with the philosophy of fuzzy
clustering, which allows each data point to be a member of all the clusters in differing degrees.
Such fuzzy robust statistics are developed in the next section.

3.  FUZZY ROBUST STATISTICS


Robust  statistics  is  an  area  of  study  that  deals  with  variations  from  ideal  assumptions 
(references  4-7).  This  study  area  includes  the  design  of  statistics  that  are  resistant  to  outliers. 
These  statistics  are  sometimes  evaluated  by  the  percentage  of  outliers  that  must  be  present  before 
the  statistic  no  longer  gives  a  meaningful  estimate  of  the  desired  quantity.  One  example  of  such 
a statistic is the median, which can include almost 50-percent outliers before it loses its ability to
measure the center of the data sample. A second example is the MAD estimate, which measures
the dispersion of the data sample. It too has a high breakdown point. In this section, the median
and the MAD are defined and the source of their resistant behavior is explained. Then,
alternative definitions are given that can be generalized for application to fuzzy data sets.
Throughout this section, the data samples are assumed to be one dimensional. When the median
and MAD statistics are applied to higher dimensional samples, it is on a component-by-component
basis.

Suppose the data set is $X = \{x_1, x_2, \ldots, x_N\}$, where each element $x_k$ is a p-dimensional
vector. If p = 1, so that the samples are real numbers, then the median of X is defined in terms of
the order statistics. The ordered N-sample is $\{x_{(1)}, x_{(2)}, \ldots, x_{(N)}\}$, where by definition
$x_{(1)} \le x_{(2)} \le \cdots \le x_{(N)}$ and the subscript (i) means that the original data have been relabeled or
permuted so the sample set is ordered. Then, the median of X is defined to be $x_{(l+1)}$ if
$N = 2l + 1$ and $[x_{(l)} + x_{(l+1)}]/2$ if $N = 2l$. If p > 1, the definition is applied to each dimension
of the sample and the median vector is constructed from the vector of individual medians. Since
the median is defined in terms of the ordered sample, it is an order statistic and there appears to
be no way to extend it to a fuzzy set. In one dimension, the median represents the halfway point
of the sample set having an equal number of samples smaller and larger than itself. This
interpretation explains why almost half of the data points must be outliers before the median
loses its effectiveness as a measure of centrality. Half of the points to the left, say, must be
outliers before the median is pulled off to the left. In fact, the finite breakdown point of the
median is one-half (reference 4, p. 99).

The  robust  estimator  of  dispersion,  the  MAD,  is  also  an  order  statistic.  This  statistic  is 
defined  as  the  median  of  the  absolute  deviations  from  the  median.  To  construct  this  statistic,  one 
takes the samples $X = \{x_1, x_2, \ldots, x_N\}$ and constructs another data set

$$Y = \{ |x_1 - \mathrm{med}(X)|, |x_2 - \mathrm{med}(X)|, \ldots, |x_N - \mathrm{med}(X)| \},$$

finds the median of this set, and then scales it. For this report, the MAD is defined as mad(X) =
med(Y) / 0.6745, where the constant 0.6745 adjusts the dispersion measure to be 1 when the
sample is Gaussian with unit variance. Intuitively, one folds the data about the median med(X) and
finds the median of the set of positive deviations about the median. The breakdown point of the
MAD is also one-half (reference 5, p. 107).
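
As a small worked illustration of these crisp definitions, the following sketch computes the median and the scaled MAD of a sample containing one gross outlier and compares them with the mean and standard deviation. The function name `mad` and the sample values are illustrative assumptions.

```python
import numpy as np

def mad(x):
    """Median absolute deviation from the median, scaled by 1/0.6745."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x))) / 0.6745

x = np.array([2.0, 3.0, 1.0, 4.0, 1000.0])   # one gross outlier

print("mean   :", x.mean())        # pulled far from the bulk of the data
print("std    :", x.std())
print("median :", np.median(x))    # stays inside the bulk of the data
print("mad    :", mad(x))
```

Here the absolute deviations about the median 3 are {1, 0, 2, 1, 997}, whose median is 1, so the MAD is 1/0.6745, while the mean is 202.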

Although  the  median  and  the  MAD  are  resistant  to  outliers,  they  are  constructed  on  crisp 
sets, i.e., the set of data points X and the set of absolute deviations about the median Y are crisp
sets. The linear order of the real numbers does not take into account the membership of the
points in these sets. However, if these statistics are reformulated, the membership of the samples
in the sets can be taken into account. This is accomplished by using another definition of the
median that does not depend on the linear ordering of the samples. The median can be defined as
the solution m of (reference 8, p. 233)

$$\min_{m \in \mathbb{R}} \sum_{k=1}^{N} |x_k - m|.$$

This can be seen most easily by taking the derivative with respect to m, so one wants

$$\sum_{k=1}^{N} \mathrm{sgn}(x_k - m) = 0.$$

For $N = 2l + 1$, the derivative is zero at the point $m = x_{(l+1)}$, and for $N = 2l$, the derivative is
zero for any $m \in (x_{(l)}, x_{(l+1)})$. This definition is amenable to generalization to fuzzy sets. For
the c class problem, if $u_{ik}$ is the membership of $x_k$ in class i, then $m_i$ is given by

$$\min_{m_i \in \mathbb{R}} \sum_{k=1}^{N} u_{ik} |x_k - m_i|,$$

and the resulting statistic $m_i$ has the characteristics of a median and applies to the fuzzy set $X_i =
u_{i1}/x_1 + u_{i2}/x_2 + \cdots + u_{iN}/x_N$. Fuzzy sets are defined using a "+" sign to link the elements
$u_{ik}/x_k$ of the set, where $u_{ik}$ is the membership grade and $x_k$ is the element. This is precisely the
set that is used to update the exemplars in the fuzzy c-Means algorithm. The derivative also
exists and is given, up to sign, by

$$\sum_{k=1}^{N} u_{ik}\, \mathrm{sgn}(x_k - m_i).$$

Numerically solving for the root of this functional using bisection yields the
fuzzy median estimate of the set.
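
A bisection solver for this root can be sketched as below. The function name `fuzzy_median`, the bracketing interval, and the fixed iteration count are assumptions made for illustration. The weighted sign sum is non-increasing in m, positive to the left of the smallest sample and negative to the right of the largest, so the interval [min x, max x] brackets the root. When the sum is zero over a whole interval (for example, an even number of equally weighted points), this sketch returns the left end of that interval rather than its midpoint.

```python
import numpy as np

def fuzzy_median(x, u, n_iter=100):
    """Root of sum_k u_k * sgn(x_k - m) found by bisection."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    lo, hi = x.min(), x.max()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        g = np.sum(u * np.sign(x - mid))
        if g > 0:          # more membership mass above mid: move right
            lo = mid
        else:              # otherwise the root is at or below mid
            hi = mid
    return 0.5 * (lo + hi)

x = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])
print(fuzzy_median(x, np.ones(5)))                   # unit memberships
print(fuzzy_median(x, [1.0, 1.0, 1.0, 0.2, 0.01]))   # outlier nearly excluded
```

With unit memberships the result agrees with the crisp median of an odd-sized sample, and lowering the membership of a point reduces its influence on the estimate.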

The limiting cases of this statistic are consistent. If $u_{ik} = u_i(x_k) = 1, \ \forall k$, then the
definition reverts to the standard median. For the two-class problem, if $N = 2l$ and the first
N/2 samples are from class 1 and the second N/2 are from class 2, then

$$u_{1k} = u_1(x_k) = \begin{cases} 1, & k \le N/2 \\ 0, & k > N/2 \end{cases} \qquad
u_{2k} = u_2(x_k) = \begin{cases} 0, & k \le N/2 \\ 1, & k > N/2. \end{cases}$$

The resulting fuzzy medians reduce to the crisp medians on both of the crisp subsets associated
with the classes. The fuzzy median is a measure of central tendency that also reflects the
membership of the sample points in the fuzzy set and that reduces to the crisp median where
appropriate.

The MAD estimate can also be reformulated in the functional form

$$\min_{\eta \in \mathbb{R}} \sum_{k=1}^{N} \bigl|\, |x_k - m| - \eta \,\bigr|,$$

and the resulting MAD estimate is mad = $\eta$ / 0.6745. Note that this requires that the median m
be known beforehand. For a fuzzy data set, the median does not exist; however, the fuzzy
median does. So one can define the fuzzy MAD recursively for the i-th fuzzy data set by
assuming that the fuzzy median exists. This functional definition is given by

$$\min_{\eta_i \in \mathbb{R}} \sum_{k=1}^{N} u_{ik} \bigl|\, |x_k - m_i| - \eta_i \,\bigr|,$$

and the fuzzy MAD is given by fuzmad$_i$ = $\eta_i$ / 0.6745. From an implementation point of view,
the process is somewhat recursive since one first forms the fuzzy median $m_i$, uses this to
construct a new fuzzy data set

$$Y_i = u_{i1}/|x_1 - m_i| + u_{i2}/|x_2 - m_i| + \cdots + u_{iN}/|x_N - m_i|,$$

finds the fuzzy median on this set, and then scales it. To apply these statistics on a p-dimensional
space, one has to apply them on each component separately.
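
The two-stage construction just described can be sketched as follows. The names `fuzzy_median` and `fuzzy_mad` are illustrative assumptions, and the bisection helper is a compact restatement of the root-finding idea rather than the report's code.

```python
import numpy as np

def fuzzy_median(x, u, n_iter=100):
    """Bisection root of sum_k u_k * sgn(x_k - m)."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    lo, hi = x.min(), x.max()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if np.sum(u * np.sign(x - mid)) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fuzzy_mad(x, u):
    """Fuzzy median m and fuzzy MAD of the one-dimensional fuzzy set (x, u)."""
    x = np.asarray(x, dtype=float)
    m = fuzzy_median(x, u)                  # first stage: fuzzy median
    eta = fuzzy_median(np.abs(x - m), u)    # second stage: same memberships,
    return m, eta / 0.6745                  # applied to the folded data, then scaled

m, s = fuzzy_mad([2.0, 3.0, 1.0, 4.0, 1000.0], np.ones(5))
print(m, s)
```

The memberships are reused unchanged in the second stage, so a point with low membership in a class contributes little to both the location and the dispersion estimate of that class.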

Although  this  approach  is  a  simple  modification  of  these  statistics,  it  is  important  to  note 
that it allows the application of robust statistics to fuzzy algorithms. The median is an
M-estimator, or Huber's generalization of the maximum likelihood estimator. Many times these
estimators m are formulated implicitly by specifying functionals of the form

$$\sum_{k=1}^{N} \rho(x_k - m),$$

where $\rho$ satisfies certain boundary, symmetry, and non-negativity conditions. It would seem
that all these functionals could be weighted with the appropriate membership functions, thus
allowing this whole class of estimators to be applied to fuzzy algorithm development.
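
To make the suggestion concrete, the sketch below weights one particular choice of $\rho$, the Huber function, with memberships and minimizes the result by iteratively reweighted averaging. This example is not taken from the report: the choice of the Huber function, the tuning constant k = 1.345, the fixed `scale`, the starting value, and the name `fuzzy_huber_location` are all assumptions made purely for illustration.

```python
import numpy as np

def fuzzy_huber_location(x, u, scale=1.0, k=1.345, n_iter=100):
    """Minimize sum_k u_k * rho((x_k - m) / scale) for the Huber rho."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    m = np.sum(u * x) / np.sum(u)           # start from the weighted mean
    for _ in range(n_iter):
        r = np.abs(x - m) / scale
        # Huber weights: 1 inside the threshold k, k/|r| outside it.
        w = np.where(r <= k, 1.0, k / np.maximum(r, 1e-300))
        m = np.sum(u * w * x) / np.sum(u * w)
    return m

x = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])
print(fuzzy_huber_location(x, np.ones(5)))          # outlier has bounded influence
print(fuzzy_huber_location(x, [1, 1, 1, 1, 0]))     # outlier has zero membership
```

In this sketch the memberships multiply the usual robustness weights, so a point can be discounted either because it is far from the current estimate or because it barely belongs to the class.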


4.  MODIFIED  FUZZY  C-MEANS  ALGORITHM 

At  each  iteration,  the  fuzzy  c-Means  algorithm  depends  on  two  estimates  made  on  the 
fuzzy  subsets  associated  with  the  c  classes.  In  step  2,  a  linear  combination  of  data  points  is  used 
to estimate the exemplars of the fuzzy classes. In step 3, the new membership values are
estimated based on the distance to the exemplars, which are normalized with the inverse of the
fuzzy covariance matrix of the data sets. These linear statistics are now replaced by the
corresponding fuzzy robust statistics. Thus, for each class, the sample mean is replaced by the
fuzzy median. The covariance matrix is replaced by a diagonal approximation of $\Sigma =
\mathrm{diag}(\mathrm{mad}^2(X_1), \ldots, \mathrm{mad}^2(X_p))$, a simple yet effective approximation.

The  robust  fuzzy  algorithm  is  then  stated  as  follows: 

1.  Fix c, the number of classes such that $c \in \{2, \ldots, N-1\}$.

Choose an inner product norm metric in $\mathbb{R}^p$ and fix the weighting exponent $m_c \in [1, \infty)$.
(The c in $m_c$ refers to the c-Means algorithm and m refers to the median.)

Initialize the membership matrix denoted by $U^{(0)} \in M_{fc}$.

2.  For each class $i = 1, \ldots, c$, and for each component of the data vector $j = 1, \ldots, p$, solve
for $m_{ij}$:

$$\sum_{k=1}^{N} u_{ik}\, \mathrm{sgn}(x_{kj} - m_{ij}) = 0,$$

where $x_{kj}$ is the j-th component of the k-th sample vector and $m_{ij}$ is the j-th
component of the fuzzy median for the i-th cluster. The new exemplars are $v_i = m_i$.

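3.  For each class $i = 1, \ldots, c$, form the new fuzzy vector set

$$Y_i = u_{i1}/|x_1 - m_i| + u_{i2}/|x_2 - m_i| + \cdots + u_{iN}/|x_N - m_i|,$$

and for each component of the data vector $j = 1, \ldots, p$, solve for $\eta_{ij}$:

$$\sum_{k=1}^{N} u_{ik}\, \mathrm{sgn}(|x_{kj} - m_{ij}| - \eta_{ij}) = 0,$$

and then scale $\eta_{ij} \leftarrow \eta_{ij} / 0.6745$ to obtain the fuzzy MAD estimator.

Form the class covariance matrix from $\Sigma_i = \mathrm{diag}(\eta_{i1}^2, \ldots, \eta_{ip}^2)$.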

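4.  Update the memberships $u_{ik}$ using $(d_{ik})^2 = (x_k - v_i)^t \Sigma_i^{-1} (x_k - v_i)$ in

$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{2/(m_c - 1)} \right]^{-1},$$

provided of course that none of the $d_{ik}$ are zero. In the latter case, the $u_{ik}$ are
assigned differently (reference 1, p. 66).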

5.  Compare the last two membership matrices, $U^{(l)}$ and $U^{(l+1)}$, and when they are
sufficiently close, terminate the algorithm; otherwise, return to step 2.
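
The five steps above can be sketched compactly as follows. This is an illustrative sketch under stated assumptions, not the code that produced the figures: the names `fuzzy_median` and `robust_fuzzy_c_means`, the parameters `tol` and `max_iter`, and the small numerical floors that stand in for the zero-distance rule are all hypothetical, and the update assumes $m_c > 1$.

```python
import numpy as np

def fuzzy_median(x, u, n_iter=100):
    """Bisection root of sum_k u_k * sgn(x_k - m)."""
    lo, hi = x.min(), x.max()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if np.sum(u * np.sign(x - mid)) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def robust_fuzzy_c_means(X, U0, m_c=2.0, tol=1e-5, max_iter=50):
    X = np.asarray(X, dtype=float)             # shape (N, p)
    U = np.asarray(U0, dtype=float).copy()     # shape (c, N)
    c, p = U.shape[0], X.shape[1]
    for _ in range(max_iter):
        V = np.empty((c, p))                   # exemplars (fuzzy medians)
        S = np.empty((c, p))                   # diagonal of each class covariance
        for i in range(c):
            for j in range(p):
                # Step 2: componentwise fuzzy median.
                V[i, j] = fuzzy_median(X[:, j], U[i])
                # Step 3: componentwise fuzzy MAD, squared for the diagonal.
                eta = fuzzy_median(np.abs(X[:, j] - V[i, j]), U[i]) / 0.6745
                S[i, j] = max(eta, 1e-12) ** 2  # assumption: floor to avoid division by zero
        # Step 4: squared distances scaled by the diagonal covariance.
        D2 = (((X[None, :, :] - V[:, None, :]) ** 2) / S[:, None, :]).sum(axis=2)
        D2 = np.fmax(D2, 1e-24)                # assumption: floor instead of the d_ik = 0 rule
        inv = D2 ** (-1.0 / (m_c - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        # Step 5: compare successive membership matrices.
        done = np.abs(U_new - U).max() < tol
        U = U_new
        if done:
            break
    return U, V

# Hypothetical one-dimensional data: two groups and one extreme outlier.
X_demo = np.array([-1, -0.5, 0, 0.5, 1, 9, 9.5, 10, 10.5, 11, 1000.0]).reshape(-1, 1)
u0 = np.array([0.9] * 5 + [0.1] * 5 + [0.5])
U_demo, V_demo = robust_fuzzy_c_means(X_demo, np.vstack([u0, 1.0 - u0]))
print(V_demo.ravel())
```

In this small example the exemplars settle on the centers of the two groups even though the point at 1000 keeps a membership of about one-half in each class, because the fuzzy median responds to the membership mass on either side of it and not to the magnitude of the outlier.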

Figure 4 shows the exemplar tracks for the same Gaussian clusters used to illustrate the
fuzzy c-Means algorithm in figure 2. Note that the exemplar tracks again converge on the cluster
centers, but by a somewhat straighter trajectory. So, for the light-tailed distributions,
the convergence is as expected with the fuzzy c-Means algorithm. Figure 5 shows similar
exemplar trajectories, produced by the robust fuzzy c-Means algorithm, superimposed on the
Cauchy clusters. The corresponding trajectories for fuzzy c-Means do not exist, since one of
the exemplars simply diverged. The divergence was caused by the large outliers destroying the
sample mean and covariance estimates.

Figure 4.  Robust Fuzzy c-Means Convergence on Two Gaussian Clusters Located
at Antipodal Positions (The two distributions used to generate the random
deviates are N(-1.27,1) and N(1.27,1).)


Figure 5.  Robust Fuzzy c-Means Convergence on Two Cauchy Clusters Located at
Antipodal Positions (The points shown in the figure do not include the outliers
caused by the heavy-tailed distribution.)

A second illustration of the same observed behavior is with the Slash distribution, which
is defined as the ratio of a Gaussian to a uniform distribution (reference 7). The sample set
illustrated in figure 6 was generated using the ratio of N(0.0,0.5) and uniform [0,1] random
deviates. Again, the robust fuzzy c-Means algorithm converges nicely, but the standard fuzzy
c-Means does not.
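
Slash deviates of this kind can be generated as sketched below. Several details are assumptions, since the report does not state them: 0.5 is treated as the standard deviation of the Gaussian, each coordinate gets its own independent uniform denominator, the uniform deviates are drawn from (0, 1] to avoid division by zero, and the center (1.27, 0) and the name `slash_cluster` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)              # arbitrary fixed seed

def slash_cluster(center, n, sigma=0.5, rng=rng):
    """Ratio of Gaussian to uniform deviates per coordinate, shifted to center."""
    center = np.asarray(center, dtype=float)
    z = rng.normal(0.0, sigma, size=(n, center.size))
    w = 1.0 - rng.random(size=(n, center.size))   # uniform on (0, 1]
    return center + z / w

pts = slash_cluster([1.27, 0.0], 5000)
print("coordinate-wise median:", np.median(pts, axis=0))
print("largest coordinate magnitude:", np.abs(pts).max())
```

Small values of the uniform denominator produce the occasional very large deviate, which is what makes the Slash distribution heavy-tailed.
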

5.  CONCLUSIONS


Before order statistics can be applied to fuzzy data sets, they must be generalized to take
into account the membership of the data points in the fuzzy set. This was done by properly
weighting the functionals that generate the statistics with the membership values of the data
points. Two such statistics were developed, the fuzzy median and the fuzzy MAD. The fuzzy
median can replace the weighted average and the fuzzy MAD can replace the standard deviation.
The fuzzy MAD can also be used to generate an approximate replacement for the covariance
matrix. Both statistics were applied within the fuzzy c-Means clustering algorithm and shown
for two examples to stabilize the convergence.

The  approach  used  to  "fuzzify"  robust  statistics  is  currently  being  applied  to  more 
sophisticated  robust  estimates  of  location  and  scale  and  the  results  will  be  reported  when 
available. The replacement of linear estimates by more resistant fuzzy estimators in any fuzzy
algorithm should help to make that algorithm more robust.